PCA Demo

This version uses Plotly Express for the plots.

In [1]:
# Standard imports
import numpy as np

#from dash import Dash, html, dash_table, dcc, callback, Output, Input
#import plotly.graph_objects as go
#from jupyter_dash import JupyterDash
#import pandas as pd
import plotly.express as px
Intel MKL WARNING: Support of Intel(R) Streaming SIMD Extensions 4.2 (Intel(R) SSE4.2) enabled only processors has been deprecated. Intel oneAPI Math Kernel Library 2025.0 will require Intel(R) Advanced Vector Extensions (Intel(R) AVX) instructions.

Create a data cloud

In [2]:
# Two unit vectors that span the plane the cloud is spread over
y1 = np.array([4, 2, 3], dtype=float)
y1 /= np.linalg.norm(y1)
y2 = np.array([-1, 1, 0], dtype=float)
y2 /= np.linalg.norm(y2)
In [3]:
n_points = 500
In [4]:
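# Each point is a random combination of y1 (std 2) and y2 (std 1),
# plus uniform noise in [0, 1) added to every coordinate
A = []
for k in range(n_points):
    x = 2.*np.random.randn()*y1 + np.random.randn()*y2
    A.append( x + np.random.rand(3) )
A = np.array(A)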
In [5]:
np.shape(A)
Out[5]:
(500, 3)
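
The loop above draws from NumPy's global random state, so each run produces a different cloud. A minimal sketch of a seeded, vectorized version of the same recipe follows. The seed value `0` and the names `rng`, `coeffs`, and `A_seeded` are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily for illustration

y1 = np.array([4, 2, 3], dtype=float)
y1 /= np.linalg.norm(y1)
y2 = np.array([-1, 1, 0], dtype=float)
y2 /= np.linalg.norm(y2)

n_points = 500
# Gaussian coefficients along y1 (std 2) and y2 (std 1), one row per point
coeffs = rng.standard_normal((n_points, 2)) * np.array([2.0, 1.0])
# Same recipe as the loop: in-plane point plus uniform [0, 1) noise per coordinate
A_seeded = coeffs @ np.vstack([y1, y2]) + rng.random((n_points, 3))
print(A_seeded.shape)
```

With a fixed seed the singular values printed later would be the same on every run, which makes the demo easier to compare across sessions.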

View Data Cloud in 3-D

In [6]:
fig = px.scatter_3d(x=A[:,0], y=A[:,1], z=A[:,2],
                    title='The cloud is like an elongated pancake')
fig.update_traces(marker_size=3)
fig.show()

SVD of data matrix

In [7]:
U, S, VT = np.linalg.svd(A, full_matrices=False)
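
The cell above factorizes `A` as it is, without subtracting the column means. PCA is usually defined on mean-centred data, and the uniform noise used here shifts every coordinate by about 0.5 on average. The sketch below shows the centred variant on stand-in data and checks it against the eigenvalues of the sample covariance. The matrix `X` and the names `Xc`, `Sc`, and `cov_eigvals` are hypothetical and only serve the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in data with a non-zero mean (hypothetical, not the notebook's A)
X = rng.standard_normal((500, 3)) @ np.diag([2.0, 1.0, 0.3]) + 0.5

Xc = X - X.mean(axis=0)  # subtract the column means
Uc, Sc, VTc = np.linalg.svd(Xc, full_matrices=False)

# Eigenvalues of the sample covariance, sorted largest first
cov_eigvals = np.sort(np.linalg.eigvalsh(np.cov(Xc, rowvar=False)))[::-1]

# For centred data, S**2 / (n - 1) matches the covariance eigenvalues
print(np.allclose(Sc**2 / (len(Xc) - 1), cov_eigvals))
```

Whether centring changes the picture much in this demo depends on how large the mean offset is relative to the spread of the cloud.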
In [8]:
U.T @ U
Out[8]:
array([[ 1.00000000e+00,  1.94289029e-16, -1.90819582e-17],
       [ 1.94289029e-16,  1.00000000e+00, -1.04083409e-17],
       [-1.90819582e-17, -1.04083409e-17,  1.00000000e+00]])
In [9]:
VT @ VT.T
Out[9]:
array([[ 1.00000000e+00, -1.24900090e-16, -5.55111512e-17],
       [-1.24900090e-16,  1.00000000e+00,  0.00000000e+00],
       [-5.55111512e-17,  0.00000000e+00,  1.00000000e+00]])
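
The two products above equal the identity up to round-off of order 1e-16. Rather than reading the entries by eye, the same checks can be written with `np.allclose`. This is a small sketch on a stand-in matrix `M` (a hypothetical name, used so the block runs on its own).

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((500, 3))  # stand-in for the data matrix
U, S, VT = np.linalg.svd(M, full_matrices=False)

I3 = np.eye(3)
print(np.allclose(U.T @ U, I3))             # columns of U are orthonormal
print(np.allclose(VT @ VT.T, I3))           # rows of VT are orthonormal
print(np.allclose(U @ np.diag(S) @ VT, M))  # the factors reproduce M
```

Because `full_matrices=False` returns a 500 x 3 matrix `U`, the product `U @ U.T` is a 500 x 500 projection and not the identity; only `U.T @ U` is.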
In [10]:
S
Out[10]:
array([49.24013109, 22.31148657,  6.36140257])
In [11]:
np.diag(S)
Out[11]:
array([[49.24013109,  0.        ,  0.        ],
       [ 0.        , 22.31148657,  0.        ],
       [ 0.        ,  0.        ,  6.36140257]])
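
The squared singular values show how much of the data each direction carries. The sketch below uses the three values printed above and computes each direction's share of the total, which comes out to about 82%, 17%, and 1%. Since `A` was not mean-centred, these are shares of the total squared norm of `A`, which is only an approximation of "explained variance". The name `frac` is an assumption introduced here.

```python
import numpy as np

# Singular values copied from the output above
S = np.array([49.24013109, 22.31148657, 6.36140257])

# Share of the total squared norm of A carried by each direction
frac = S**2 / np.sum(S**2)
print(np.round(frac, 3))
print(np.round(np.cumsum(frac), 3))
```

The small third share is what justifies dropping the third direction in the low-rank approximation.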

Low-Rank Approximation

In [12]:
k = 2

# Zero every singular value after the first k to get a rank-k approximation
Sk = S.copy()
Sk[k:] = 0.

Ak = U @ np.diag(Sk) @ VT
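
By the Eckart-Young theorem, the truncated SVD is the best rank-k approximation in the Frobenius norm, and its error equals the root sum of squares of the discarded singular values. A quick numerical check on stand-in data follows. The names `M` and `Mk` are hypothetical and stand in for `A` and `Ak`.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((500, 3)) @ np.diag([2.0, 1.0, 0.3])  # stand-in data
U, S, VT = np.linalg.svd(M, full_matrices=False)

k = 2
# Keep only the k largest singular values and their vectors
Mk = (U[:, :k] * S[:k]) @ VT[:k, :]

print(np.linalg.matrix_rank(Mk))
err = np.linalg.norm(M - Mk)  # Frobenius norm is the default for 2-D arrays
print(np.isclose(err, np.sqrt(np.sum(S[k:]**2))))
```

Slicing `U`, `S`, and `VT` gives the same matrix as zeroing the trailing singular values, without forming the full diagonal matrix.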

3D (flattened) Data Cloud

In [14]:
fig = px.scatter_3d(x=Ak[:,0], y=Ak[:,1], z=Ak[:,2], title='Looking at the subspace end-on')
fig.update_traces(marker_size=2)
fig.show()

Express data in terms of Principal Components

In [15]:
# Project points into 2D using the first two right singular vectors (rows of VT)
C = A @ VT[:2,:].T
In [16]:
# Or, equivalently, scale the first two columns of U by their singular values
C = U[:, :2] @ np.diag(S[:2])
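
The two ways of computing the coordinates agree because `A @ V = U @ diag(S) @ VT @ V = U @ diag(S)` when the rows of `VT` are orthonormal. The sketch below confirms this numerically on stand-in data. The names `M`, `C_proj`, and `C_scaled` are assumptions introduced for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((500, 3)) @ np.diag([2.0, 1.0, 0.3])  # stand-in data
U, S, VT = np.linalg.svd(M, full_matrices=False)

k = 2
C_proj = M @ VT[:k, :].T     # project onto the first k right singular vectors
C_scaled = U[:, :k] * S[:k]  # scale the first k left singular vectors

print(C_proj.shape, np.allclose(C_proj, C_scaled))
```

Multiplying `U[:, :k]` by `S[:k]` with broadcasting scales each column by its singular value, which matches a product with `np.diag(S[:k])`.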
In [17]:
fig = px.scatter(x=C[:,0], y=C[:,1])
fig.update_traces(marker_size=5)
fig.show()